Consider the normalized partial sums of a real-valued function $F$
of a Markov chain,

\[ \phi_n := n^{-1} \sum_{k=0}^{n-1} F(\Phi(k)), \qquad n \ge 1. \]

The chain $\{\Phi(k) : k \ge 0\}$ takes values in a general state space $X$, with
transition kernel $P$, and it is assumed that the Lyapunov drift condition
holds: $PV \le V - W + b I_C$, where $V : X \to (0,\infty)$, $W : X \to [1,\infty)$,
the set $C$ is small and $W$ dominates $F$. Under these assumptions, the
following conclusions are obtained:

1. It is known that this drift condition is equivalent to the existence
of a unique invariant distribution $\pi$ satisfying $\pi(W) < \infty$, and the
law of large numbers holds for any function $F$ dominated by $W$:

\[ \phi_n \to \phi := \pi(F), \qquad \text{a.s.}, \ n \to \infty. \]

2. The lower error probability defined by $\mathsf{P}\{\phi_n \le c\}$, for $c < \phi$, $n \ge 1$,
satisfies a large deviation limit theorem when the function F satisfies 
a monotonicity condition. Under additional minor conditions an exact 
large deviations expansion is obtained. 

3. If $W$ is near-monotone, then control-variates are constructed based
on the Lyapunov function $V$, providing a pair of estimators that
together satisfy nontrivial large deviation asymptotics for the lower and
upper error probabilities.

In an application to simulation of queues it is shown that exact 
large deviation asymptotics are possible even when the estimator does 
not satisfy a central limit theorem. 
1. Introduction. This paper explores extensions of the control-variate
method to obtain confidence bounds in simulation of a function of a Markov
chain $\Phi = \{\Phi(0), \Phi(1), \ldots\}$. It is assumed that $\Phi$ evolves on a general state
space $X$, equipped with a countably generated sigma-field $\mathcal{B}$. The statistics
of $\Phi$ are determined by its initial distribution, and the transition kernel $P$
defined by

\[ P(x, A) := \mathsf{P}\{\Phi(1) \in A \mid \Phi(0) = x\}, \qquad x \in X, \ A \in \mathcal{B}. \]

Let $\{L_n : n \ge 1\}$ denote the sequence of empirical measures induced by $\Phi$
on $(X, \mathcal{B})$,

\[ L_n := \frac{1}{n} \sum_{k=0}^{n-1} \delta_{\Phi(k)}, \qquad n \ge 1. \tag{1} \]

It is assumed that $\Phi$ is positive Harris recurrent, with unique invariant
probability distribution denoted $\pi$. Equivalently, for each bounded measurable
function $F : X \to \mathbb{R}$, and each initial condition, the law of large numbers
holds:

\[ L_n(F) \to \phi := \pi(F) = \int F(x)\,\pi(dx), \qquad n \to \infty, \ \text{a.s.} \]

See [19, 22] or [38], Theorem 17.1.7. For each measurable function $F : X \to \mathbb{R}$
satisfying $\pi(|F|) < \infty$, the sequence $\{L_n(F) : n \ge 1\}$ is interpreted as Monte
Carlo estimates of the steady-state mean of $F$.

While the estimates are consistent for each initial condition even when $F$
is not bounded, finer assumptions are required to obtain confidence bounds,
that is, bounds on $\mathsf{P}\{|L_n(F) - \phi| \ge a\}$ for a given $a > 0$. Such bounds are
typically based on one of the following limit theorems:

The central limit theorem (CLT). For some $\sigma \ge 0$ and each initial
condition,

\[ \sqrt{n}\,[L_n(F) - \phi] \xrightarrow{d} \sigma X, \qquad n \to \infty, \tag{2} \]

where $X$ is a standard normal random variable, and the convergence is in
distribution.



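The large deviation principle, or LDP. For a convex function
$I : \mathbb{R} \to \mathbb{R}_+$, and any nonempty open interval $(c_0, c_1) \subset \mathbb{R}$,

\[
\begin{aligned}
\lim_{n\to\infty} n^{-1} \log \mathsf{P}\{L_n(F) - \phi \in (c_0, c_1)\}
 &= \lim_{n\to\infty} n^{-1} \log \mathsf{P}\{L_n(F) - \phi \in [c_0, c_1]\} \\
 &= -\min_{c \in [c_0, c_1]} I(c).
\end{aligned}
\tag{3}
\]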
There has been tremendous research activity concerning large deviation
properties of Markov chains following the pioneering work of Donsker and 
Varadhan [15, 46, 47]. The literature contains a broad range of possible 
conclusions under a correspondingly broad range of assumptions (see the 
monographs [11, 12]). 

The strongest conclusions are based on variants of the assumptions
imposed by Donsker and Varadhan in [13, 14, 15], that are essentially equivalent
to compactness of the $n$-step transition operator, for some $n > 0$ (see [45],
Theorem 2.1, or [10], Lemma 3.4). Under these assumptions the LDP holds
for the empirical distributions [13, 15, 16], and the limit (3) holds for a class
of unbounded functions $F : X \to \mathbb{R}$ [3, 49]. These conclusions are refined in
[34] where in particular precise limit theory is obtained, generalizing the 
expansions of Bahadur and Ranga Rao for the partial sums of independent 
random variables [2, 4, 11]. Similar results are obtained in [33] for bounded 
functions on X under geometric ergodicity alone. Explicit, finite-time bounds 
have been obtained for uniformly ergodic chains in [21, 32]. 

Although most of the theory is based on assumptions on the Markov 
chain that are far stronger than geometric ergodicity, these conditions can 
be relaxed to obtain a weaker "pinned LDP" [41, 42]. Lower bounds can be 
obtained under essentially irreducibility alone [8, 29, 48]. 

The function $I : \mathbb{R} \to \mathbb{R}_+ \cup \{\infty\}$ appearing in (3) has many possible
representations. In the limit theory of [3, 34, 41, 42] and the bounds obtained
in [8, 29], the rate function is expressed as the convex dual

\[ I(c) = \sup_{a \in \mathbb{R}} [ca - \Lambda(a)], \tag{4} \]

where the "pinned" log moment generating function is defined as

\[ \Lambda(a) = \lim_{n\to\infty} n^{-1} \log \mathsf{E}_\nu\bigl[\exp(n a L_n(F))\, \mathbb{I}\{\Phi(n) \in C\}\bigr], \qquad a \in \mathbb{R}, \tag{5} \]

with $C \subset X$ a "small set" and $\nu$ a "small measure" (see discussion in
Section 2.1). Under the assumptions imposed in this aforementioned work, the
limit (5) exists, though it may be infinite, and is independent of the
particular pair $(C, \nu)$ chosen (see [43], and the review in Section 2.1).

Sufficient as well as necessary conditions for the central limit theorem
for Markov chains are presented in [20, 22, 23, 38, 43]. Much of the theory
is based upon the fundamental kernel. Recall that a real-valued kernel $P$
on $X \times \mathcal{B}$ is viewed as a linear operator, acting on functions $h : X \to \mathbb{R}$ and
probability measures $\mu$ on $\mathcal{B}$, via

\[ Ph(\cdot) = \int_X P(\cdot, dy)\, h(y) \quad \text{and} \quad \mu P(\cdot) = \int_X \mu(dx)\, P(x, \cdot). \tag{6} \]
Under appropriate assumptions on $\Phi$, the fundamental kernel can be
expressed for an appropriate class of functions $F : X \to \mathbb{R}$ via

\[ ZF = \sum_{k=0}^{\infty} \bigl(P^k F - \pi(F)\bigr). \tag{7} \]

The following bilinear and quadratic forms are defined for measurable
functions $F, G : X \to \mathbb{R}$,

\[ \langle\langle F, G \rangle\rangle := P(FG) - (PF)(PG), \qquad Q(F) := P(F^2) - (PF)^2. \]

Under appropriate conditions, the asymptotic variance given in (2) can be
expressed $\sigma^2(F) = \pi(Q(ZF))$ (see [38], Theorem 17.5.3, and Proposition 2.1
below).

The purpose of the control-variate method is to reduce the variance of
the standard estimator defined by

\[ \phi_n := L_n(F), \qquad n \ge 1. \tag{8} \]

Suppose that there is a $\pi$-integrable function $H : X \to \mathbb{R}$ with known mean.
By normalization we can assume that $\pi(H) = 0$, so that $L_n(F_\theta)$ is an
asymptotically unbiased estimator of $\phi$ for each $\theta \in \mathbb{R}$, with $F_\theta := F - \theta H$.
The asymptotic variance of the controlled estimator is given by

\[ \sigma^2(F_\theta) = \pi(Q(ZF_\theta)) = \pi\bigl(\langle\langle ZF, ZF \rangle\rangle - 2\theta \langle\langle ZF, ZH \rangle\rangle + \theta^2 \langle\langle ZH, ZH \rangle\rangle\bigr). \]

Minimizing over $\theta \in \mathbb{R}$ gives the estimator with minimal asymptotic variance,

\[ \theta^* = \frac{\pi(\langle\langle ZF, ZH \rangle\rangle)}{\pi(\langle\langle ZH, ZH \rangle\rangle)}. \]

See [17, 18, 36, 40] for more details and background on the general
control-variate method.
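
For a chain on a finite state space every object above is a matrix or a vector, so the formula for $\theta^*$ can be evaluated directly. The following is a minimal sketch under that assumption, not a method from the paper: the three-state matrix `P`, the function `F` and the guess `J` are hypothetical illustrative values, and the fundamental kernel is computed as $(I - P + 1 \otimes \pi)^{-1} - 1 \otimes \pi$, which agrees with the sum (7) in the finite ergodic case.

```python
import numpy as np

def stationary(P):
    """Invariant distribution pi of an irreducible stochastic matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # pi P = pi and sum(pi) = 1
    rhs = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def fundamental_kernel(P, pi):
    """Matrix form of (7): Z = sum_k (P^k - 1 (x) pi)."""
    n = P.shape[0]
    Pi = np.outer(np.ones(n), pi)
    return np.linalg.inv(np.eye(n) - P + Pi) - Pi

def bilinear(P, f, g):
    """<<f, g>> = P(fg) - (Pf)(Pg), evaluated at every state."""
    return P @ (f * g) - (P @ f) * (P @ g)

def control_variate(P, F, H):
    """Return (theta_star, variance with theta = 0, variance with theta_star)."""
    pi = stationary(P)
    Z = fundamental_kernel(P, pi)
    ZF, ZH = Z @ F, Z @ H
    theta = (pi @ bilinear(P, ZF, ZH)) / (pi @ bilinear(P, ZH, ZH))

    def variance(th):
        G = ZF - th * ZH
        return pi @ bilinear(P, G, G)

    return theta, variance(0.0), variance(theta)

# Hypothetical example data (not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.2, 0.5],
              [0.0, 0.6, 0.4]])
F = np.array([0.0, 1.0, 4.0])
J = np.array([0.0, 1.0, 2.0])   # rough guess at a Poisson-equation solution
H = J - P @ J                    # pi(H) = 0 because pi is invariant
theta_star, var_plain, var_controlled = control_variate(P, F, H)
```

Since the asymptotic variance is a quadratic in $\theta$, `var_controlled` can never exceed `var_plain` in this sketch.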

An approach considered in [24, 25] is to consider functions of the form
$H = J - PJ$, and choose $J$ so that it approximates the solution $\hat F$ to Poisson's
equation,

\[ P\hat F = \hat F - F + \phi. \tag{9} \]

The idea is that if $J = \hat F$, then the resulting controlled estimator with $\theta = 1$
has zero asymptotic variance.
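
The zero-variance claim can be seen concretely on a finite state space: if $J$ solves Poisson's equation (9) exactly, then $H = J - PJ = F - \phi$, so $F - H$ is the constant $\phi$ and the controlled estimator equals $\phi$ along every sample path. The sketch below checks this numerically; the three-state chain and the function `F` are hypothetical illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.2, 0.5],
              [0.0, 0.6, 0.4]])
F = np.array([0.0, 1.0, 4.0])
n = P.shape[0]

# Invariant distribution: left eigenvector of P for the eigenvalue 1.
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
phi = pi @ F

# Solve Poisson's equation (I - P) F_hat = F - phi, pinned by pi(F_hat) = 0.
A = np.vstack([np.eye(n) - P, pi])
rhs = np.append(F - phi, 0.0)
F_hat, *_ = np.linalg.lstsq(A, rhs, rcond=None)

H = F_hat - P @ F_hat        # control variate built from J = F_hat
F_controlled = F - H         # controlled function with theta = 1

# Simulate a short path and form both estimators.
x, path = 0, []
for _ in range(200):
    path.append(x)
    x = rng.choice(n, p=P[x])
standard = F[path].mean()               # fluctuates around phi
controlled = F_controlled[path].mean()  # equals phi up to rounding
```

In practice $\hat F$ is unknown, which is why the text turns to approximations $J$ such as fluid value functions.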

This approach has been successfully applied in queueing models by taking 
J equal to an associated fluid value function. The approach is provably effec- 
tive in simple models [25], and numerical examples show dramatic variance 
reduction for more complex networks [26, 27]. Some theory to help explain 
the results of [26] is developed in [37] based on large deviation limit results 
contained in [33]. 
Fig. 1. Monte Carlo estimates of $\phi := \pi(F)$ with $F(x) = e^{0.1x}$ for $x \in \mathbb{Z}_+$. The
stochastic process $\Phi$ is an M/M/1 queue initialized at zero, with load $\rho = 0.9$. After
a transient period, the estimates are consistently larger than the steady-state mean of
$\phi = (1 - \rho e^{\beta})^{-1}(1 - \rho)$.



For the sake of illustration consider the reflected random walk on $\mathbb{R}_+$,
defined by the recursion

\[ \Phi(k+1) = [\Phi(k) + D(k+1)]_+, \qquad k \ge 0, \tag{10} \]

with $[x]_+ = \max(x, 0)$ for $x \in \mathbb{R}$, and $D$ i.i.d. Consider first the special case
in which $D$ has common marginal distribution,

\[ D(k) = \begin{cases} 1, & \text{with probability } \alpha, \\ -1, & \text{with probability } 1 - \alpha. \end{cases} \]

In this case $\Phi$ is a discrete-time model of the M/M/1 queue, and the state
space is then restricted to $X = \mathbb{Z}_+$. It is assumed that $\alpha \in (0, \tfrac12)$ so that $\Phi$
is a positive recurrent Markov chain on $X$.

The invariant distribution $\pi$ is geometric, so there is little motivation
to simulate. However, ignoring this issue momentarily, suppose we wish to
estimate using simulation the steady-state mean of $F(x) = e^{\beta x}$ for a given
$\beta > 0$.

Shown in Figure 1 are Monte Carlo estimates of the steady-state mean,

\[ \phi := \pi(F) = (1 - \rho) \sum_{x=0}^{\infty} \rho^x e^{\beta x}, \]

where $\rho := \alpha/(1 - \alpha)$. In this simulation $\beta = 0.1$ and $\alpha = 9/19$, so that
$\rho = 9/10$ and $\phi = (1 - \rho e^{\beta})^{-1}(1 - \rho) \approx 18.705$. The Markov chain $\Phi$ was
initialized at zero, $\Phi(0) = 0$. The runlength in this simulation extended to
$T = 5 \times 10^6$, yet the estimates are significantly larger than the steady-state
mean over much of the run. The following proposition provides some
explanation. A proof is provided in Section 3.2.
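
The experiment described above is easy to reproduce in outline. The sketch below simulates the recursion (10) with the Bernoulli increments of the M/M/1 model and compares the running average of $F(x) = e^{\beta x}$ with the closed-form steady-state mean. The function names, the seed and the run length in the usage line are illustrative choices, not taken from the paper, and a pure-Python loop of length $5 \times 10^6$ would take a noticeable amount of time.

```python
import math
import random

def mm1_estimate(alpha, beta, n_steps, seed=0):
    """Monte Carlo estimate L_n(F) for F(x) = exp(beta * x), started at zero."""
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(n_steps):
        total += math.exp(beta * x)            # F(Phi(k))
        step = 1 if rng.random() < alpha else -1
        x = max(x + step, 0)                   # reflected random walk, recursion (10)
    return total / n_steps

def steady_state_mean(alpha, beta):
    """Closed form (1 - rho) / (1 - rho * e^beta), valid when rho * e^beta < 1."""
    rho = alpha / (1.0 - alpha)
    if rho * math.exp(beta) >= 1.0:
        raise ValueError("steady-state mean is infinite for this beta")
    return (1.0 - rho) / (1.0 - rho * math.exp(beta))

phi = steady_state_mean(9 / 19, 0.1)              # about 18.705
estimate = mm1_estimate(9 / 19, 0.1, n_steps=100_000)
```

Repeating the run with different seeds gives a feel for how slowly, and how asymmetrically, the estimates settle near `phi`.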

The existence of a nontrivial LDP depends upon structure of the sublevel
sets of $F$, defined by $C_F(r) := \{x : F(x) \le r\}$ for $r \ge 1$. This structure holds
in Proposition 1.1(ii) since the sublevel sets are finite for each $r$.
Proposition 1.1. Consider the M/M/1 queue with $\rho = \alpha/(1 - \alpha) < 1$.

(i) The Markov chain is geometrically ergodic, and its marginal
distribution is geometric with parameter $\rho$.

(ii) Consider the function $F(x) = x$ or $F(x) = e^{\beta x}$, $x \in \mathbb{Z}_+$, for some
fixed $\beta \in (0, |\log(\rho)|)$. The Monte Carlo estimates of the steady-state mean
$\phi := \pi(F)$ are consistent, and there exists $\underline{c} < \phi$ such that the LDP (3) holds
for any open set $O \subset (\underline{c}, \infty)$, and each initial condition $\Phi(0) = x \in \mathbb{Z}_+$. The
convex rate function $I : [\underline{c}, \infty) \to \mathbb{R}_+$ is strictly positive on $[\underline{c}, \phi)$ and can be
expressed as the convex dual (4). The rate function is identically zero on
$[\phi, \infty)$. Consequently, we have for each initial condition $x \in \mathbb{Z}_+$,

\[ \lim_{n\to\infty} \frac{1}{n} \log\bigl(\mathsf{P}_x\{L_n(F) \le c\}\bigr) = -I(c) < 0, \qquad c \in (\underline{c}, \phi), \]

and

\[ \lim_{n\to\infty} \frac{1}{n} \log\bigl(\mathsf{P}_x\{L_n(F) \ge c\}\bigr) = 0, \qquad c > \phi. \]

In this example the chain is geometrically ergodic, so an LDP bound
might not be surprising since an exact LDP holds when $F$ is bounded [33].
Section 3.2 contains a similar example in which analogous conclusions hold,
yet the chain is not geometrically ergodic, and $\{\phi_n\}$ does not even satisfy the CLT.
Moreover, in this example a control-variate is constructed to obtain a pair
of estimators giving upper and lower confidence bounds.

The remainder of the paper is organized as follows. Section 2 contains a 
statement and proof of the most important conclusion in this paper, The- 
orem 2.2, which establishes the LDP for a general class of functions on X. 
Section 2.1 contains a survey of spectral theory for Markov chains, following 
[3, 33, 34, 39]. A new criterion for the existence of a spectral gap is presented 
in Section 2.2, which is the main ingredient in the proof of Theorem 2.2. 

Applications of Theorem 2.2 to the construction and analysis of control- 
variates are contained in Section 3.1. The simulation algorithm proposed in 
Section 3.1 is shown to satisfy exact upper and lower LDP bounds. This 
result is illustrated in Section 3.2 using the reflected random walk (10). 
Conclusions are contained in Section 4. 

2. One-sided large deviation asymptotics. Throughout the paper it is
assumed that $\Phi$ is positive Harris recurrent and aperiodic. Equivalently,
there is a unique invariant probability distribution $\pi$ on $\mathcal{B}$ such that, for any
$A \in \mathcal{B}$ satisfying $\pi(A) > 0$, and any initial condition $x$,

\[ \lim_{k\to\infty} \|P^k(x, \cdot) - \pi(\cdot)\| = 0, \qquad x \in X, \]

where $\|\cdot\|$ denotes the total-variation norm ([38], Theorem 13.0.1). We
denote by $\mathcal{B}^+$ the set of $A \in \mathcal{B}$ satisfying $\pi(A) > 0$. We write $f \in \mathcal{B}^+$ if
$f : X \to \mathbb{R}_+$ is a measurable function with $\int f\, d\pi > 0$.

A measurable function $s : X \to \mathbb{R}_+$ and a probability measure $\nu$ on $\mathcal{B}$ are
called small if for some $n \ge 1$ we have

\[ P^n(x, A) \ge s(x)\nu(A), \qquad x \in X, \ A \in \mathcal{B}. \tag{11} \]

A set $C$ is called small if $s = \varepsilon I_C$ is a small function for some positive $\varepsilon$.

The following Lyapunov drift condition is assumed throughout the paper:

(V3) For a function $W : X \to [1, \infty)$,
a small set $C \subset X$, and a constant $b < \infty$,
\[ PV \le V - W + b I_C \quad \text{on } C_V := \{x : V(x) < \infty\}. \]

Given any measurable function $F : X \to \mathbb{R}$ satisfying $\pi(|F|) < \infty$, we
can construct a solution to (V3) with $W = 1 + |F|$ by applying [38],
Theorem 14.2.3. The set $C_V$ on which $V$ is finite is absorbing, so that the chain
can be restricted to this set; see [38], Proposition 4.2.3.
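
As a concrete instance of (V3), consider the M/M/1 model of Section 1 with $W \equiv 1$. The sketch below checks numerically that the linear function $V(x) = (x+1)/(1-2\alpha)$ satisfies the drift inequality with $C = \{0\}$ and $b = 1 + \alpha/(1-2\alpha)$. These particular choices of $V$, $C$ and $b$ are illustrative assumptions made here, not taken from the paper; singletons are small for this irreducible chain on a countable state space.

```python
def worst_drift_violation(alpha, x_max=500):
    """Largest value of PV(x) - [V(x) - W(x) + b * 1{x = 0}] over 0 <= x <= x_max.

    A value that is non-positive (up to rounding) means the drift inequality
    holds on the states that were checked.
    """
    assert 0.0 < alpha < 0.5
    c = 1.0 / (1.0 - 2.0 * alpha)
    b = 1.0 + alpha * c

    def V(x):
        return c * (x + 1)

    worst = float("-inf")
    for x in range(x_max + 1):
        # One step of the reflected walk: up w.p. alpha, down (reflected at 0) otherwise.
        PV = alpha * V(x + 1) + (1.0 - alpha) * V(max(x - 1, 0))
        bound = V(x) - 1.0 + (b if x == 0 else 0.0)
        worst = max(worst, PV - bound)
    return worst

violation = worst_drift_violation(9 / 19)
```

For this choice the inequality holds with equality at every state, so `violation` is zero up to floating-point rounding.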
For a given function $W : X \to [1, \infty)$ the weighted $L_\infty$-norm is defined as

\[ \|h\|_W := \sup_x |h(x)|/W(x), \]

and $L_\infty^W$ denotes the set of all measurable functions $h : X \to \mathbb{R}$ for which this
norm is finite (see [28, 30, 31, 33, 34, 38]). The supremum norm $\|\cdot\|_\infty$ is
precisely $\|\cdot\|_W$ with $W \equiv 1$. Two functions $W, W' : X \to [1, \infty)$ are called
equivalent if they generate the same function space, that is,

\[ W' \in L_\infty^{W} \quad \text{and} \quad W \in L_\infty^{W'}. \]

The set of finite measures on $\mathcal{B}$ is denoted $\mathcal{M}$; the set $\mathcal{M}_1 \subset \mathcal{M}$ denotes
probability measures on $\mathcal{B}$; $\mathcal{M}^W \subset \mathcal{M}$ denotes measures satisfying $\mu(W) < \infty$;
and $\mathcal{M}_1^W = \mathcal{M}^W \cap \mathcal{M}_1$.

The convergence results in parts (i) and (ii) of Proposition 2.1 are
contained in the $f$-norm ergodic theorem (Theorem 14.0.1) of [38]. The
interpretation of the sum as a version of the fundamental kernel is contained in
[38], Theorem 17.4.2.

Part (iii) follows from [38], Theorem 17.4.4, and (iv) is contained in [38], 
Theorem 16.0.1. 

Proposition 2.1. Suppose that $\Phi$ is $\psi$-irreducible and aperiodic and
(V3) holds with $V$ everywhere finite. Then:

(i) The chain is positive Harris recurrent with $\pi(W) < \infty$, and we have

\[ \lim_{k\to\infty} \|P^k(x, \cdot) - \pi(\cdot)\|_W = 0, \qquad x \in X. \]

Moreover, the fundamental kernel Z : — > exists as a bounded linear 
operator. 
(ii) If $\pi(V) < \infty$, then the chain is called $W$-regular of degree 2. In
this case the fundamental kernel can be expressed as the sum (7). The sum
converges in the induced operator norm from to . 

(iii) If $\pi(WV) < \infty$, then the CLT (2) holds for each $F \in L_\infty^W$.

(iv) If $W \ge \varepsilon_0 V$ for some $\varepsilon_0 > 0$ then we say that (V4) holds. In this
case $\Phi$ is $V$-uniformly ergodic: for some constants $b_0 < \infty$, $r > 1$,

\[ \sum_{k=0}^{\infty} r^k \|P^k(x, \cdot) - \pi(\cdot)\|_V \le b_0 (V(x) + 1), \qquad x \in X. \]

We list below some other definitions for a given measurable function
$F : X \to \mathbb{R}$: The function is called

Degenerate if there is a measurable function $H : X \to \mathbb{R}$ such that when
$\Phi(0) \sim \pi$, $F(\Phi(k+1)) - F(\Phi(k)) = H(\Phi(k))$ a.s. for $k \ge 0$. Under
appropriate bounds on $H$ this implies that the asymptotic variance of $\phi_n$ is equal
to zero. A converse is provided in [34], Lemma 4.12, based on [33],
Proposition 2.4.

Lattice if there are $h > 0$ and $0 \le d < h$, such that

\[ \frac{F(x) - d}{h} \ \text{is an integer}, \qquad x \in X. \tag{12} \]

If there exists a lattice function $F_\ell$ such that $F - F_\ell$ is degenerate, then
$F$ is called almost-lattice. Otherwise, $F$ is called strongly nonlattice.

Near-monotone if $\inf_{x \in X} F(x) > -\infty$, and the sublevel set $C_F(r) := \{x \in
X : F(x) \le r\}$ is small or empty for each $r < \|F_+\|_\infty$, where $F_+ = \max(F, 0)$
and $\|\cdot\|_\infty$ denotes the supremum norm.

Large deviation bounds are obtained in [3] for countable state-space chains
under the assumption that $F$ is near-monotone, and $\Lambda(F) < \|F\|_\infty$, where
$\Lambda(F)$ is defined in (5) using $a = 1$. These assumptions are far stronger than
geometric ergodicity when F is unbounded. The results of [3] are strength- 
ened and generalized to general state-space chains and processes in [33, 34]. 

By restricting the range of c in (3) we can relax the geometric ergodicity 
assumption. The proof of Theorem 2.2 is included at the end of this section. 

The most important assumption in Theorem 2.2 is the constraint (13).
To interpret this condition, consider first the countable state-space case. If
$V$ and $F$ have finite sublevel sets and $V$ is unbounded, then this condition is
immediate since $C_F(r)$ and $C_V(r_0)$ are each finite for $r < \|F\|_\infty$ and $r_0 < \infty$,
and $C_V(r_0) \uparrow X$ as $r_0 \uparrow \infty$.

For general state-space models the set $C_V(r_0)$ is always $F$-regular for
any finite $r_0$, and hence small, by Theorem 14.0.1 combined with
Proposition 14.1.2 of [38]. In this way we can interpret (13) as simultaneously a
relaxation and strengthening of the near-monotone condition.
Theorem 2.2. Suppose that (V3) holds with $V$ everywhere finite, and
that $F \in L_\infty^W$ is a nondegenerate function satisfying $\pi(F) = 0$. Suppose
moreover that the sublevel set $C_F(r)$ satisfies for some $r > \phi = 0$, $r_0 < \infty$,

\[ C_F(r) \subset C_V(r_0). \tag{13} \]

Then, there exists $c_0 < \phi$ and a smooth convex function $I : (c_0, \phi) \to (0, \infty)$
such that:

(i) The LDP (3) holds for each initial condition $x \in X$ and each $c \in
(c_0, \phi)$.

(ii) If $F$ is strongly nonlattice, then for each $c \in (c_0, \phi)$, there exists a
bounded function $g_c : X \to (0, \infty)$, such that for each initial condition $x \in X$,

\[ \mathsf{P}_x\{L_n(F) \le c\} \sim \frac{g_c(x)}{\sqrt{n}}\, e^{-n I(c)}, \qquad n \to \infty. \tag{14} \]

The LDP asymptotics described in Theorem 2.2 are based on the spectral 
theory of a positive semigroup obtained from the function F to be simulated. 
The definitions presented in Section 2.1 are taken from [3, 33, 34, 38, 39, 43]. 

2.1. Positive semigroups. Consider now a positive kernel $\hat P$ on $X \times \mathcal{B}$. It
is assumed that the semigroup $\{\hat P^k : k \ge 0\}$ is $\psi$-irreducible,

\[ \sum_{k=0}^{\infty} \hat P^k(x, A) > 0, \qquad x \in X, \ A \in \mathcal{B}^+, \]

and also aperiodic,

\[ \liminf_{k\to\infty} \mathbb{I}\{\hat P^k(x, A) > 0\} = 1 \quad \text{for each } x \in X, \ A \in \mathcal{B}^+. \]

For a $\psi$-irreducible kernel there exists a function $s \in \mathcal{B}^+$, a probability
measure $\nu$ on $\mathcal{B}$, and $n_0 \ge 1$ such that $\hat P^{n_0} \ge s \otimes \nu$. The function $s$ and measure
$\nu$ are called $\hat P$-small, generalizing the definition for a probabilistic kernel $P$.

Based on a given function $h : X \to (0, \infty)$ we consider in Section 2.2 the
two positive kernels,

the scaled kernel: $P_h := I_h P$, or equivalently,

\[ P_h(x, A) := h(x) P(x, A), \qquad x \in X, \ A \in \mathcal{B}, \]

the twisted kernel: $\check P_h := I_{Ph}^{-1} P I_h$, or equivalently,

\[ \check P_h(x, A) = \frac{\int_A P(x, dy)\, h(y)}{Ph(x)}, \qquad x \in X, \ A \in \mathcal{B}. \tag{15} \]

The twisted kernel is probabilistic, so that $\check P_h(x, X) = 1$ for all $x$, provided
$Ph(x) < \infty$ for all $x \in X$.
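
On a finite state space the two constructions reduce to elementary matrix operations, which makes the normalization in (15) easy to see. The sketch below is only an illustration: the transition matrix `P` and the positive function `h` are hypothetical values.

```python
import numpy as np

def scaled_kernel(P, h):
    """P_h(x, y) = h(x) P(x, y): each row x is multiplied by h(x)."""
    return np.diag(h) @ P

def twisted_kernel(P, h):
    """Twisted kernel (15): P(x, y) h(y) / Ph(x)."""
    Ph = P @ h
    return (P * h) / Ph[:, None]   # P * h scales column y by h(y)

# Hypothetical example data.
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.2, 0.5],
              [0.0, 0.6, 0.4]])
h = np.array([1.0, 2.0, 5.0])

P_scaled = scaled_kernel(P, h)      # positive, but rows no longer sum to one
P_twisted = twisted_kernel(P, h)    # rows sum to one for any positive h
row_sums = P_twisted.sum(axis=1)
```

The scaled kernel is generally not probabilistic, while the twisted kernel always is, as long as $Ph$ is finite.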
For any $\psi$-irreducible and aperiodic semigroup, the generalized principal
eigenvalue (g.p.e.) is defined as $\lambda = e^{\Lambda}$, where $\Lambda \in [-\infty, \infty]$ is the limit,

\[ \Lambda := \lim_{n\to\infty} n^{-1} \log(\nu \hat P^n s). \tag{16} \]

The limit is independent of the particular small function $s \in \mathcal{B}^+$ and small
measure $\nu$ chosen. If $\Lambda$ is finite, then there is an associated eigenfunction
$h : X \to (0, \infty]$ satisfying $h(x) < \infty$ a.e. $[\psi]$, and

\[ \hat P h \le \lambda h. \]

This is an equality provided $\hat P$ is $\lambda$-recurrent,

\[ \sum_{k=0}^{\infty} \lambda^{-k} \nu \hat P^k s = \infty. \]

See [9, 33, 43] for further discussion.
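
For a finite irreducible chain the g.p.e. of the scaled kernel $P_a = I_{e^{aF}} P$ is its Perron eigenvalue, so both the limit (16) and the convex dual (4) can be evaluated numerically. The following sketch makes that finite-state assumption explicit; the two-state chain, the function `F`, the choices of `nu` and `s` (taken up to constant factors, which do not affect the limit) and the grid used for the supremum are all illustrative, not taken from the paper.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # invariant distribution pi = (2/3, 1/3)
F = np.array([0.0, 1.0])            # so pi(F) = 1/3

def log_gpe(a):
    """Lambda(a): log of the Perron eigenvalue of the scaled kernel P_a."""
    P_a = np.diag(np.exp(a * F)) @ P
    return float(np.log(np.max(np.abs(np.linalg.eigvals(P_a)))))

def log_gpe_limit(a, n=5000):
    """n^{-1} log(nu P_a^n s) as in (16), rescaling each step to avoid overflow."""
    P_a = np.diag(np.exp(a * F)) @ P
    nu = np.array([0.5, 0.5])
    vec = np.array([1.0, 0.0])       # s proportional to the indicator of state 0
    log_scale = 0.0
    for _ in range(n):
        vec = P_a @ vec
        m = vec.max()
        vec = vec / m
        log_scale += np.log(m)
    return float((log_scale + np.log(nu @ vec)) / n)

def rate_function(c, grid=None):
    """I(c) = sup_a [c a - Lambda(a)], approximated on a finite grid of a."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 2001)
    return max(c * a - log_gpe(a) for a in grid)

rate_at_mean = rate_function(1.0 / 3.0)   # close to zero at c = pi(F)
rate_below = rate_function(0.1)           # strictly positive away from the mean
```

The grid search only gives a lower bound on the supremum; a root finder applied to $\Lambda'(a) = c$ would be the natural refinement.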

For a given weighting function $v : X \to [1, \infty)$, the induced operator norm
of $\hat P$ is

\[ |||\hat P|||_v := \sup\Bigl\{ \frac{\|\hat P h\|_v}{\|h\|_v} : h \in L_\infty^v, \ \|h\|_v \ne 0 \Bigr\}. \tag{17} \]

The spectrum $S(\hat P) \subset \mathbb{C}$ of $\hat P$ is the set of $z \in \mathbb{C}$ such that the inverse
$[Iz - \hat P]^{-1}$ does not exist as a bounded linear operator on $L_\infty^v$.

The spectral radius of the semigroup $\{\hat P^k\}$ is expressed $\xi = e^{\Xi}$, where

\[ \Xi := \lim_{k\to\infty} k^{-1} \log(|||\hat P^k|||_v). \tag{18} \]

We say that $\hat P$ is $v$-uniform if the spectral radius $\xi$ is finite, and there exists
$h \in L_\infty^v$, $\mu \in \mathcal{M}_1^v$, such that

\[ \sup_{|z| \ge \xi} |||[Iz - (\hat P - h \otimes \mu)]^{-1}|||_v < \infty. \]

When $v \equiv 1$ we drop the qualification and simply say that $\hat P$ is uniform.

If the kernel is $v$-uniform, then it admits a spectral gap, and the
generalized principal eigenvalue coincides with $\xi$. Moreover, the eigenfunction $h$
satisfies $h \in L_\infty^v$, and the eigenfunction equation $\hat P h = \lambda h$ holds ([34],
Proposition 2.9).

2.2. Multiplicative mean-ergodic theorem. The multiplicative mean-ergodic 
theorem contained in Theorem 2.3 is the basis of LDP asymptotics for the 
partial sums [3, 33, 34]. 

The proof of Theorem 2.3 is identical to the proof of Theorem 3.4 in [34].
The idea of the proof of (20) is as follows: The twisted kernel $\check P_h$ is $\check v$-uniform,
where $h = \check f$ is an eigenfunction and $\check v := v/h$, since the twisted kernel is
simply a scaling and similarity transformation of $P_f$ with $f = e^F$. Since each
of the twisted kernels is probabilistic, this implies that $\check P_h$ is the transition
kernel for a $\check v$-uniformly ergodic Markov chain (see [33], Corollary 4.7 or [34],
Proposition 2.11 for finer results). The multiplicative mean-ergodic theorem
is a consequence of this mean-ergodic theorem for the twisted chain, and the
representation

\[ P_f^n(x, A) = \mathsf{E}_x\bigl[\exp(n L_n(F))\, \mathbb{I}\{\Phi(n) \in A\}\bigr], \qquad x \in X, \ A \in \mathcal{B}. \]

An explicit formula for the eigenfunction $\check f$ is given in (28).
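
The representation above gives a direct way to observe the multiplicative mean-ergodic limit on a finite state space: taking $A = X$, the vector $\lambda^{-n} P_f^n 1$ equals $\mathsf{E}_x[\exp(\sum_{k<n} F(\Phi(k)) - n\Lambda)]$, and it converges geometrically fast to the eigenfunction once the eigenvectors are normalized so that the eigenmeasure has total mass one and integrates the eigenfunction to one. The sketch below assumes a hypothetical two-state chain and function `F`; it illustrates the limit and is not the proof technique of [34].

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
F = np.array([0.0, 0.5])
P_f = np.diag(np.exp(F)) @ P                  # scaled kernel with f = e^F

# Perron eigenvalue with right (eigenfunction) and left (eigenmeasure) vectors.
w, right = np.linalg.eig(P_f)
i = np.argmax(w.real)
lam = w[i].real
f_check = np.abs(right[:, i].real)
wl, left = np.linalg.eig(P_f.T)
mu_check = np.abs(left[:, np.argmax(wl.real)].real)
mu_check = mu_check / mu_check.sum()          # total mass one
f_check = f_check / (mu_check @ f_check)      # mu_check(f_check) = 1

def twisted_expectation(n):
    """E_x[exp(sum_{k<n} F(Phi(k)) - n * Lambda)] for every x, via lam^{-n} P_f^n 1."""
    vec = np.ones(2)
    for _ in range(n):
        vec = (P_f @ vec) / lam
    return vec

# Sup-norm distance to the eigenfunction for increasing n.
errors = [float(np.max(np.abs(twisted_expectation(n) - f_check))) for n in (1, 10, 50)]
```

The entries of `errors` shrink at the rate set by the ratio of the second eigenvalue of `P_f` to `lam`, which is the finite-state counterpart of a spectral gap.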

Theorem 2.3. Suppose that $F : X \to \mathbb{R}$ is measurable; its g.p.e. $\lambda$ is
finite; and that $P_f$ is $v$-uniform, with $f = e^F$. Then, there exists a measure
$\check\mu \in \mathcal{M}_1^v$ and a function $\check f \in L_\infty^v$ satisfying the eigenfunction equations
$P_f \check f = \lambda \check f$, $\check\mu P_f = \lambda \check\mu$, with normalization,

\[ \check\mu(\check f) = \check\mu(X) = 1, \tag{19} \]

and these are the unique solutions. Moreover, the following multiplicative
mean-ergodic theorem holds: For some $b_0 > 0$, $b_1 < \infty$ and all $x \in X$, $n \ge 1$,

\[ \Bigl| \mathsf{E}_x\Bigl[\exp\Bigl(\sum_{k=0}^{n-1} F(\Phi(k)) - n\Lambda\Bigr)\Bigr] - \check f(x) \Bigr| \le b_1 e^{-b_0 n} v(x). \tag{20} \]

In the series of results that follow we present sufficient conditions for
uniformity of a scaled kernel. It will be convenient to consider a family
of kernels $\{P_a : a \in (0, 1]\}$ where $P_a = P_{f_a}$ is the scaled kernel defined with
$f_a = e^{aF}$, and $F : X \to \mathbb{R}$ a given measurable function. A family of resolvent
kernels is defined by

\[ R_a := \sum_{k=0}^{\infty} 2^{-k-1} P_a^k, \qquad a \in (0, 1], \tag{21} \]

and $R$ denotes the kernel obtained when $a = 0$ so that $f_a \equiv 1$. It can be
shown that $v$-uniformity of $R_a$ is equivalent to $v$-uniformity of $P_a$ when $P$
is aperiodic and $\lambda_a < 2$, and in this case $\lambda_a$ is the g.p.e. for $P_a$ if and only if
$\gamma_a = (2 - \lambda_a)^{-1}$ is the g.p.e. for $R_a$.

We assume throughout that the function $F$ is normalized so that $\pi(F) =
0$. It then follows from the definition (16), Jensen's inequality and the
mean-ergodic theorem that $\lambda_a, \gamma_a \in [1, \infty]$ for each $a \in \mathbb{R}$.

Under the assumptions of Theorem 2.4 the kernel $R_a$ is a bounded linear
operator on $L_\infty^{v_a}$ for $a \in [0, 1]$, where $v_a = e^{aV}$. The first bound in (22) is
analogous to Condition (V4) of [38]. These two bounds are equivalent to
geometric ergodicity for the Markov chain in the special case $F \equiv 0$. For
general $F$ this is not true, as we shall see in Theorem 2.6.
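
The resolvent kernels (21) and the correspondence $\gamma_a = (2 - \lambda_a)^{-1}$ between the two generalized principal eigenvalues can be checked directly on a finite state space, where $R_a = (2I - P_a)^{-1}$ whenever $\lambda_a < 2$. In the sketch below the two-state chain, the centered function `F` and the value of `a` are hypothetical illustrative choices.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # invariant distribution pi = (2/3, 1/3)
F = np.array([-1.0 / 3.0, 2.0 / 3.0])      # centered so that pi(F) = 0
a = 0.5
P_a = np.diag(np.exp(a * F)) @ P           # scaled kernel with f_a = e^{aF}
I = np.eye(2)

# Closed form of the resolvent and a truncation of the series (21).
R_a = np.linalg.inv(2.0 * I - P_a)
series = sum(2.0 ** (-k - 1) * np.linalg.matrix_power(P_a, k) for k in range(200))

# Perron eigenvalues of P_a and R_a.
lam_a = float(np.max(np.linalg.eigvals(P_a).real))
gamma_a = float(np.max(np.linalg.eigvals(R_a).real))

# Residual of the resolvent equation P_a R_a = 2 R_a - I.
resolvent_gap = float(np.max(np.abs(P_a @ R_a - (2.0 * R_a - I))))
```

With `F` centered, `lam_a` comes out no smaller than one, in line with the Jensen argument mentioned above.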
Theorem 2.4. Suppose that $F : X \to \mathbb{R}$ is a given measurable function
satisfying $\pi(|F|) < \infty$ and $\pi(F) = 0$. Suppose that there exist $\lambda_1 < 1$, a
function $v : X \to [1, \infty]$ that is not everywhere infinite, a function $s : X \to \mathbb{R}_+$, a
probability distribution $\nu$ on $\mathcal{B}$ and a constant $b < \infty$ satisfying the bounds

\[ P_f v \le \lambda_1 v + bs \quad \text{and} \quad R_a \ge s \otimes \nu \ \text{ for all } 0 \le a \le 1. \tag{22} \]

Then, $v(x) < \infty$ a.e. $[\pi]$, and there exists $\bar a \in (0, 1)$ such that $P_a$ is
$v_a$-uniform for all $a \in (0, \bar a)$.

Proof. To show that $R_a$ is $v_a$-uniform we prove that $|||G_a|||_{v_a} < \infty$,
where $G_a$ denotes the potential kernel,

\[ G_a = \sum_{k=0}^{\infty} \gamma_a^{-k-1} [R_a - s \otimes \nu]^k, \tag{23} \]

and $\gamma_a = (2 - \lambda_a)^{-1}$ is the g.p.e. for $R_a$.

Jensen's inequality implies the following family of bounds:

\[ P_a v_a \le (\lambda_1 v + bs)^a \le \lambda_a v_a + a \lambda_{a-1} b s, \qquad 0 < a \le 1, \]

where $\lambda_t := (\lambda_1)^t$ for any $t \in \mathbb{R}$. Moreover, the resolvent equation holds:
$P_a R_a = R_a P_a = 2R_a - I$. This combined with the bound on $P_a v_a$ gives

\[ 2R_a v_a - v_a = R_a P_a v_a \le R_a[\lambda_a v_a + a \lambda_{a-1} b s], \tag{24} \]

which on rearranging terms implies the bound

\[ R_a v_a \le \gamma_a v_a + a b_a R_a s, \tag{25} \]

with $\gamma_a = (2 - \lambda_a)^{-1}$, and $b_a = b \lambda_{a-1} \gamma_a$. We evidently have $\gamma_a \le 1$ and $\lambda_a \le 1$,
so that $b_a \le b/\lambda_1$ for $a \in [0, 1]$.

Define $v'_a = (1 + a b_2) v_a - a b_2 s$ where $b_2 > b/\lambda_1$ is fixed. This function is
equivalent to $v_a$, and from (25),

\[ R_a v'_a \le (1 + a b_2)[\gamma_a v_a + a b_a R_a s] - a b_2 R_a s. \]

For $a > 0$ sufficiently small we have $b_2 \ge (1 + a b_2) b/\lambda_1 \ge (1 + a b_2) b_a$.
Consequently, for such $a$,

\[ R_a v'_a \le (1 + a b_2) \gamma_a v_a = \gamma_a v'_a + a b_2 \gamma_a s, \]

and on subtracting the function $\nu(v'_a) s$ from each side this gives

\[ [R_a - s \otimes \nu] v'_a \le \gamma_a v'_a - (\nu(v'_a) - a b_2 \gamma_a) s. \]

Decreasing $a$ still further we can assume that $(\nu(v'_a) - a b_2 \gamma_a) \ge 0$. We
conclude that there exists $\bar a > 0$ such that with $\delta_a := 1 - \gamma_a$,

\[ [R_a - s \otimes \nu] v'_a \le v'_a - \delta_a v'_a, \qquad a \in (0, \bar a]. \]

Iterating this bound gives

\[ [R_a - s \otimes \nu]^n v'_a \le v'_a - \delta_a \sum_{k=0}^{n-1} [R_a - s \otimes \nu]^k v'_a, \qquad n \ge 1, \]

and hence, $\sum_{k=0}^{\infty} [R_a - s \otimes \nu]^k v'_a \le \delta_a^{-1} v'_a$, which implies the final bound,

\[ |||[Iz - (R_a - s \otimes \nu)]^{-1}|||_{v'_a} \le \delta_a^{-1}, \qquad a \in (0, \bar a], \ |z| \ge 1. \]

This completes the proof that $|||G_a|||_{v_a} < \infty$ since $\gamma_a \ge 1$, and $v_a$ is equivalent
to $v'_a$ for $a \in (0, \bar a]$. □

The following result provides a simple criterion that guarantees the exis- 
tence of s,v satisfying the minorization condition in (22). 

Proposition 2.5. Suppose that (V3) holds, and that $F \in L_\infty^W$. Then,
for each $r \ge 1$, $a \in \mathbb{R}$, the set $C = C_V(r) := \{x : V(x) \le r\}$ is small for the
positive kernel $P_a$. Moreover, we have the following uniform bounds: For
each $a_0 > 0$, $r \ge 1$, there exist $\varepsilon_0 > 0$, $n_0 \ge 1$ and a probability distribution
$\nu_0 \in \mathcal{M}_1$ such that,

\[ P_a^{n_0}(x, A) \ge \varepsilon_0 \nu_0(A), \qquad x \in C, \ a \in [-a_0, a_0]. \]

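Proof. Fix $a_0 > 0$, $r \ge 1$, set $F_0(x) = a_0 \|F\|_W W(x)$, $x \in X$, and define
$\hat P := I_{e^{-F_0}} P$. This is simply the scaled kernel $P_h$ with $h = e^{-F_0}$. A
minorization condition obtained for the kernel $\hat P$ will imply the desired uniform
bounds since $P_a \ge \hat P$ for $a \in [-a_0, a_0]$.

For any $A \in \mathcal{B}$ and $x \in X$ we have by Jensen's inequality,

\[
\begin{aligned}
\hat P^n(x, A) &= \Bigl( P^n(x, A)^{-1}\, \mathsf{E}_x\Bigl[\exp\Bigl(-\sum_{k=0}^{n-1} F_0(\Phi(k))\Bigr) \mathbb{I}\{\Phi(n) \in A\}\Bigr] \Bigr) P^n(x, A) \\
&\ge \exp\Bigl( -P^n(x, A)^{-1}\, \mathsf{E}_x\Bigl[\sum_{k=0}^{n-1} F_0(\Phi(k))\, \mathbb{I}\{\Phi(n) \in A\}\Bigr] \Bigr) P^n(x, A).
\end{aligned}
\tag{26}
\]

Under (V3) the following bound holds: $0 \le P^n V \le V + nb - \sum_{k=0}^{n-1} P^k W$, so
that

\[ \mathsf{E}_x\Bigl[\sum_{k=0}^{n-1} F_0(\Phi(k))\Bigr] \le a_0 \|F\|_W [V(x) + nb], \qquad x \in X, \ n \ge 1. \]

Consequently, from (26),

\[ \hat P^n(x, A) \ge \exp\{-P^n(x, A)^{-1} a_0 \|F\|_W [V(x) + nb]\}\, P^n(x, A), \qquad x \in X, \ n \ge 1. \tag{27} \]

This shows that the semigroup $\{\hat P^n : n \ge 1\}$ is $\pi$-irreducible, in the sense
that $\sum_n \hat P^n(x, A) > 0$ for each $x$ whenever $\pi(A) > 0$. Let $A \in \mathcal{B}^+$ be any
fixed small set for $\hat P$; there is a probability distribution $\nu_0$, $\varepsilon > 0$ and an
integer $m \ge 1$ such that

\[ \hat P^m(x, B) \ge \varepsilon \nu_0(B), \qquad x \in A, \ B \in \mathcal{B}. \]

Choose $n \ge 1$, $\delta > 0$ such that $P^n(x, A) \ge \delta$ for $x \in C$. This is possible since
the set $C$ is $W$-regular, and hence small ([38], Theorem 14.2.3). It follows
from (27) that

\[ \hat P^n(x, A) \ge \delta_r := \exp\{-\delta^{-1} a_0 \|F\|_W [r + nb]\}\, \delta, \qquad x \in C, \]

and hence,

\[ \hat P^{n+m}(x, B) \ge \delta_r \varepsilon \nu_0(B), \qquad x \in C, \ B \in \mathcal{B}. \]

This completes the proof with $n_0 = m + n$ and $\varepsilon_0 = \delta_r \varepsilon$. □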

Up to now it appears that uniformity is a tremendously strong assumption 
on the scaled kernel Pf since the implications of uniformity are so strong. 
However, under the assumptions of Theorem 2.2 we can establish uniformity 
of P a for a range of nonpositive a, even though $ is only positive Harris 
recurrent. 

Recall that $P$ is called uniform if it is $v$-uniform with $v \equiv 1$.

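Theorem 2.6. Suppose that (V3) holds with $V$ everywhere finite, and
suppose that the function $F \in L_\infty^W$ satisfies (13) for some $r > 0$ and $r_0 < \infty$,
with $\phi = \pi(F) = 0$. Then, there exists $\underline{a} < 0$ such that:

(i) $P_a$ is uniform for each $a \in (\underline{a}, 0)$.

(ii) The eigenfunctions $\{\check f_a : a \in (\underline{a}, 0)\} \subset L_\infty$, normalized so that $\nu(\check f_a) =
1$ for some small measure $\nu$ and each $a$, are uniformly bounded:

\[ \sup_{\underline{a} < a < 0,\ x \in X} \check f_a(x) < \infty. \]

(iii) Define $\check f'_a := \frac{d}{da} \check f_a$ for $a \in (\underline{a}, 0)$, with $\{\check f_a\}$ normalized as in (ii).
These functions are uniformly bounded in norm:

\[ \sup_{\underline{a} < a < 0} \|\check f'_a\|_V < \infty. \]

(iv) $\Lambda$ is convex and analytic on $(\underline{a}, 0)$, and $\lim_{a \uparrow 0} \frac{d}{da}\Lambda(a) = 0$.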

Define the twisted kernel by $\check P_a := \check P_{\check f_a}$, where $\check f_a$ is an eigenfunction that
exists for $P_a$. We have noted prior to Theorem 2.3 that $\check P_a$ is the transition
kernel for a $\check v_a$-uniformly ergodic Markov chain when $P_a$ is $v_a$-uniform. As
in [33], Proposition 4.9 one can verify that each of the functions

\[ \check F_a = \frac{d}{da} \log(\check f_a) = \check f'_a / \check f_a, \qquad a \in (\underline{a}, 0), \]

solves Poisson's equation for the corresponding twisted kernel,

\[ \check P_a \check F_a = \check F_a - F + \phi(a), \]

where $\phi(a) = \frac{d}{da}\Lambda(a)$ is the steady-state mean of $F$ for the twisted kernel.
Poisson's equation for $\check P_a$ is used to establish versions of Theorem 2.4(iv) in
the papers [33, 34]. This technique cannot be applied here since we do not
know if $\Lambda$ is bounded or smooth for positive $a$.

The proof of Theorem 2.2 is performed in the remainder of this subsection 
through a series of steps. 

We see in Lemma 2.7 that part (i) of Theorem 2.2 follows quickly from
Theorem 2.4. Recall that $G_a = [\gamma_a I - (R_a - s \otimes \nu)]^{-1}$ is the potential kernel
previously defined in (23).

Lemma 2.7. If the assumptions of Theorem 2.6 hold, then there exists
$a_0 < 0$ such that $P_a$ is uniform for each $a \in (a_0, 0)$. The unique eigenfunction
$\check f_a \in L_\infty$ satisfying $\nu(\check f_a) = 1$ can be expressed

\[ \check f_a = G_a s. \tag{28} \]

Proof. Set $G = -F$, $g = e^G$ and $g_a = e^{aG}$ for $a \in \mathbb{R}$. Note that
uniformity of $P_{g_a}$ is equivalent to uniformity of $P_{-a}$ for any $a$.

Define $v \equiv 1$, $\delta = r - \phi$ and $b = \exp(\sup_{x \in C_F(r)} |G(x)|) < \infty$, so that the
following bound holds:

\[ P_g v = e^{-F} \le e^{-\delta} v + b I_{C_F(r)}. \]

Moreover, Proposition 2.5 implies that the minorization condition in (22)
holds with $F$ replaced by $G$ in the definition of $R_a$. Uniformity of $P_{g_a}$ for
sufficiently small $a > 0$ thus follows from Theorem 2.4 and the fact that $v$ is
bounded.

The representation (28) follows from Proposition 2.8 of [34] (see also [43]). 

□ 

The difficult part of the proof of Theorem 2.6 is to establish convergence
of $\phi(a)$ to $\phi$ as $a \uparrow 0$. The proof is based on consideration of the following
scaled kernel to bound $P_a$.

For a given small measure $\nu \in \mathcal{M}_1$, let $s = \varepsilon_\nu I_{C_V(r_0)}$ with $\varepsilon_\nu > 0$ chosen
so that $R \ge s \otimes \nu$. For a fixed $\varepsilon > 0$, define the scaled kernel,

\[ \bar P(x, A) := \exp(\varepsilon_\nu \varepsilon I_{C_V(r_0)}(x))\, P(x, A), \qquad x \in X, \ A \in \mathcal{B}, \]

and the resolvent and potential kernels,

\[ \bar R := \sum_{k=0}^{\infty} 2^{-k-1} \bar P^k, \qquad \bar G := \sum_{k=0}^{\infty} [\bar R - s \otimes \nu]^k. \]

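Lemma 2.8. Suppose that the assumptions of Theorem 2.6 hold. Then,
there exists $\varepsilon > 0$ such that:

(i) $\|\bar G s\|_\infty < \infty$,

(ii) $\|\bar G \bar R s\|_\infty < \infty$,

(iii) $\|\bar G W\|_V < \infty$.

Proof. To see (i) we write

\[ \bar P 1 = 1 + (e^{\varepsilon \varepsilon_\nu} - 1) I_{C_V(r_0)} = 1 + \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1) s. \]

Let $h = 1 - \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1) s$, and choose $\varepsilon > 0$ so this is strictly positive
everywhere. From the resolvent equation as in (24) we have

\[ \bar R 1 = 1 + \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1) \bar R s, \]

and hence

\[ \bar R h = 1 = h + \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1) s. \]

On subtracting $[s \otimes \nu] h$ from both sides we then obtain

\[ (\bar R - s \otimes \nu) h = h - \delta_h s, \]

where $\delta_h = \nu(h) - \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1)$. By reducing $\varepsilon > 0$ we can assume that
$\delta_h > 0$.

Exactly as in the proof of Theorem 2.4 we conclude that

\[ \bar G s := \sum_{k=0}^{\infty} [\bar R - s \otimes \nu]^k s \le \delta_h^{-1} h, \]

which establishes the uniform bound in (i).

Part (ii) follows from (i) and the identity,

\[ \bar G \bar R = \bar G[\bar R - s \otimes \nu] + \bar G[s \otimes \nu] = \bar G - I + \bar G[s \otimes \nu], \tag{29} \]

so that $\bar G \bar R s \le [\bar G + (\bar G s) \otimes \nu] s = (1 + \nu(s)) \bar G s$.

To see (iii) we note that we can assume without loss of generality that
the set $C$ in (V3) is equal to the set $C_V(r_0)$ used here by applying [38],
Theorem 14.2.3. Under this transformation we obtain

\[ \bar P V \le V - W + (e^{\varepsilon s} - 1) V + (e^{\varepsilon \varepsilon_\nu} - 1) b I_C \le V - W + b_\nu s, \]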






where $b_\nu = \varepsilon_\nu^{-1}(e^{\varepsilon \varepsilon_\nu} - 1)(b + r_0)$.

From the resolvent equation again this implies the bound

\[ \bar R V \le V - \bar R W + b_\nu \bar R s, \]

and then through familiar arguments, $\bar G \bar R W \le V + b_\nu \bar G \bar R s$. This bound
combined with the identity (29) completes the proof of (iii). □

With this value of $\varepsilon > 0$ fixed in the definition of $\hat R$, and given $a_0 < 0$ from
Lemma 2.7, we now identify the lower bound $\bar a$:

Lemma 2.9. Suppose that the assumptions of Theorem 2.6 hold. Then,
there exists $\bar a \in (a_0, 0)$ such that for any $\bar a \le a < 0$,

\[ \lambda_a^{-1} P_a \le P_a \le \hat P \quad \text{and} \quad \check f_a \le \hat f := \hat G s, \]

where $\check f_a$ is defined in (28).

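PROOF. The bound $\lambda_a^{-1} P_a \le P_a$ holds since $\lambda_a \ge 1$.
To see that $P_a \le \hat P$, rewrite this bound as $f_a(x) \le \exp(\varepsilon_\nu \varepsilon I_{C_V(r_0)}(x))$, or
on taking logarithms,

\[ a F(x) \le \varepsilon_\nu \varepsilon I_{C_V(r_0)}(x), \qquad x \in \mathsf{X}. \tag{30} \]

Letting $b_0 = \sup_{x \in C_0(F)} |F(x)|$ with $C_0(F) = \{x \in \mathsf{X} : F(x) \le 0\}$, and applying
the bound (13) gives,

\[ a F(x) \le |a| b_0 I_{C_0(F)}(x) \le |a| b_0 I_{C_V(r_0)}(x), \qquad a < 0. \]

This shows that (30) holds for $a \in [\bar a, 0)$ with

\[ \bar a := \max(a_0, -\varepsilon_\nu \varepsilon b_0^{-1}). \]

The second bound $\check f_a \le \hat f$ follows immediately from the first since

\[ \gamma_a^{-n-1} (R_a - s \otimes \nu)^n \le (R_a - s \otimes \nu)^n \le (\hat R - s \otimes \nu)^n, \qquad n \ge 0. \quad \square \]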

The previous two lemmas lead to a proof of Theorem 2.6(iii):

Lemma 2.10. Under the assumptions of Theorem 2.6 the function $\check f'_a$
can be expressed

\[ \check f'_a = G_a [\gamma_a \lambda_a R_a I_F - \Lambda'_a I] \check f_a, \qquad \bar a < a < 0, \]

where $\bar a$ is defined in Lemma 2.9. These functions satisfy the uniform bound,

\[ \sup_{\bar a < a < 0} \|\check f'_a\|_V < \infty. \]

Proof. The eigenfunction normalized with $\nu(\check f_a) = 1$ can be expressed
as (28), where the potential kernel $G_a$ exists as a bounded linear operator
from $L_\infty^W$ to $L_\infty^V$ by the previous two lemmas.

For any two values $a_1, a_2$ we then have

\[ \check f_{a_2} - \check f_{a_1} = G_{a_2} s - G_{a_1} s = G_{a_2} [(\gamma_{a_1} - \gamma_{a_2}) I - (R_{a_1} - R_{a_2})] G_{a_1} s. \tag{31} \]

Lemmas 2.9 and 2.8 imply that $\check f_{a_1} = G_{a_1} s$ is uniformly bounded. Convexity
of $\Lambda$ implies that $|\gamma_{a_1} - \gamma_{a_2}| (a_2 - a_1)^{-1}$ is uniformly bounded for $\bar a < a_2 < a_1 < 0$,
and it may then be verified using the mean value theorem that for
some constant $b_0 < \infty$,

\[ (a_2 - a_1)^{-1} \bigl( |\gamma_{a_1} - \gamma_{a_2}| + \|R_{a_1} 1 - R_{a_2} 1\|_W \bigr) \le b_0, \qquad \bar a < a_2 < a_1 < 0. \]

Applying Lemma 2.8 once more we conclude that

\[ (a_2 - a_1)^{-1} \|\check f_{a_2} - \check f_{a_1}\|_V \le b_1, \qquad \bar a < a_2 < a_1 < 0, \]

where $b_1 := b_0 \|\hat G W\|_V < \infty$. These bounds justify considering the limit $a_2 \uparrow a_1$
in (31) to obtain both the desired expression for $\check f'_a$ and the uniform
bounds. □

Lemma 2.11. Suppose that the assumptions of Theorem 2.6 hold. Then
$\check f_a \to 1$ and $\lambda_a \to 1$ as $a \uparrow 0$.

Proof. This follows from the uniform bound $G_a \le \hat G$, and the formula
for the limiting potential kernel,

\[ G = \sum_{k=0}^{\infty} [R - s \otimes \nu]^k. \]

We have $\check f_a = G_a s \to G s$ as $a \uparrow 0$, and it is known that $G s = 1$ since $\Phi$ is
Harris recurrent ([43], Theorem 5.1). □
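
The identity $Gs = 1$ can be checked numerically on a finite state space. The sketch below is illustrative only: the 3-state transition matrix, the uniform choice of $\nu$, the function names, and the resolvent form $R = \sum_k 2^{-(k+1)} P^k = (2I - P)^{-1}$ are assumptions made for this example rather than objects taken from the paper.

```python
import numpy as np

def resolvent(P):
    """Assumed resolvent R = sum_k 2^{-(k+1)} P^k, which equals (2I - P)^{-1}."""
    n = P.shape[0]
    return np.linalg.inv(2.0 * np.eye(n) - P)

def potential_applied_to_s(P, small_state=0):
    """Return (G s, s, nu), where G = sum_k (R - s (x) nu)^k."""
    n = P.shape[0]
    R = resolvent(P)
    nu = np.full(n, 1.0 / n)            # hypothetical small measure: uniform
    eps = n * R[small_state].min()      # guarantees R(x, .) >= s(x) nu(.)
    s = np.zeros(n)
    s[small_state] = eps                # s = eps * indicator of one state
    K = R - np.outer(s, nu)             # the kernel R - s (x) nu
    # The Neumann series sum_k K^k s solves (I - K) g = s.
    Gs = np.linalg.solve(np.eye(n) - K, s)
    return Gs, s, nu

# A small irreducible (hence Harris recurrent) chain, chosen arbitrarily.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
Gs, s, nu = potential_applied_to_s(P)
print(Gs)
```

The check works because $R1 = 1$ and $\nu(1) = 1$ give $[I - (R - s \otimes \nu)]1 = s$, so the solution of the linear system is the constant function 1.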

Proof of Theorem 2.6. Part (i) is given in Lemma 2.7; (ii) follows
from Lemmas 2.8 and 2.9; and (iii) is given in Lemma 2.10.

To see that $\Lambda$ is smooth we argue as in [33, 34]: the g.p.e. $\gamma_a$ for $R_a$ is
defined for $a \in (\bar a, 0)$ as the unique solution to

\[ \nu [I\gamma - (R_a - s \otimes \nu)]^{-1} s = 1. \]

Hence smoothness follows from the inverse function theorem; see Proposition 4.8 of [33].

We now show that $\phi(a) = \Lambda'(a) \to \phi$ as $a \uparrow 0$. Based on Lemma 2.10 we
have

\[ h := \lim_{a \uparrow 0} \check f'_a = G X \check f_{0-}, \]

with $X := (\gamma_{0-} \lambda_{0-}) R I_F - \Lambda'_{0-} I$. Moreover, $\nu(\check f'_a) = 0$ for each $a$, and hence
from the uniform bounds on $\check f'_a$ combined with the dominated convergence
theorem we have $\nu(h) = 0$.

Lemma 2.11 implies that $\gamma_{0-} = \lambda_{0-} = 1$ so that $X = [R I_F - \Lambda'_{0-} I]$, and
also $\check f_{0-} = 1$ so that

\[ 0 = \nu(h) = \nu G X 1 = \mu(RF - \Lambda'_{0-}), \]

where $\mu = \nu G$ is an unnormalized invariant measure. In particular $\mu R = \mu$
so that the expression above implies that $\mu(F) = \Lambda'_{0-} \mu(\mathsf{X})$, or equivalently,

\[ \Lambda'_{0-} = \pi(F) = \phi. \]

□

The following result is a weak version of Varadhan's lemma [11].



Proposition 2.12. The following are equivalent for a nonnegative, measurable
function $F : \mathsf{X} \to \mathbb{R}_+$, and any given initial condition $x_0 \in \mathsf{X}$:

(i) For some $c_0 > 0$,

\[ -I(c_0) := \limsup_{n \to \infty} \frac{1}{n} \log P_{x_0}\Bigl\{ \sum_{k=0}^{n-1} F(\Phi(k)) \ge n c_0 \Bigr\} < 0. \]

(ii) For some $\theta_0 > 0$,

\[ \Lambda(\theta_0) := \limsup_{n \to \infty} \frac{1}{n} \log E_{x_0}\Bigl[ \exp\Bigl( \theta_0 \sum_{k=0}^{n-1} F(\Phi(k)) \Bigr) \Bigr] < \infty. \]



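Proof. The implication (ii) $\Rightarrow$ (i) is simply Chernoff's bound.
Conversely, if (i) holds, then there exists $K_0 < \infty$ such that

\[ P_{x_0}\Bigl\{ \sum_{k=0}^{n-1} F(\Phi(k)) \ge n c_0 \Bigr\} \le K_0 e^{-I(c_0) n}, \qquad n \ge 0. \]

Consequently, since $F$ is assumed nonnegative-valued, we have for any $r \ge 1$,

\[ P_{x_0}\Bigl\{ \sum_{k=0}^{n-1} F(\Phi(k)) \ge n r c_0 \Bigr\} \le P_{x_0}\Bigl\{ \sum_{k=0}^{\lfloor nr \rfloor - 1} F(\Phi(k)) \ge n r c_0 \Bigr\}, \qquad n \ge 0. \]

Fix $\theta_0 < c_0^{-1} I(c_0)$. On multiplying each side of this bound by $\exp(\theta_0 c_0 n r)$
we obtain

\[ e^{\theta_0 c_0 n r} P_{x_0}\Bigl\{ \sum_{k=0}^{n-1} F(\Phi(k)) \ge n r c_0 \Bigr\} \le K_1 e^{(\theta_0 c_0 - I(c_0)) n r}, \qquad n \ge 0, \]

where $K_1 := K_0 e^{I(c_0)}$. Integrating both sides from $r = 1$ to $\infty$ we arrive at
the bound, for each $n \ge 0$ and $\theta_0 < c_0^{-1} I(c_0)$,

\[ E_{x_0}\Bigl[ \exp\Bigl( \theta_0 \sum_{k=0}^{n-1} F(\Phi(k)) \Bigr) \Bigr] \le e^{c_0 \theta_0 n} \Bigl[ 1 + K_1 \frac{c_0 \theta_0}{I(c_0) - c_0 \theta_0} e^{-I(c_0) n} \Bigr]. \]

This implies (ii) with $\Lambda(\theta_0) \le c_0 \theta_0$. □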
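
The integration step in this proof can be written out as follows. This is a sketch under the proposition's assumptions for $n \ge 1$; the shorthand $S_n$, $\theta$, $c$ and $I$ is introduced here only for brevity and stands for $\sum_{k=0}^{n-1} F(\Phi(k))$, $\theta_0$, $c_0$ and $I(c_0)$.

```latex
\begin{align*}
e^{\theta S_n}
  &\le e^{\theta c n} + \int_{cn}^{\infty} \theta e^{\theta t}\, \mathbf{1}\{S_n \ge t\}\, dt, \\
E_{x_0}\bigl[e^{\theta S_n}\bigr]
  &\le e^{\theta c n} + \theta c n \int_{1}^{\infty} e^{\theta c n r}\,
       P_{x_0}\{S_n \ge n r c\}\, dr
       \qquad (t = n r c) \\
  &\le e^{\theta c n} + \theta c n K_1 \int_{1}^{\infty} e^{(\theta c - I) n r}\, dr \\
  &= e^{c \theta n} \Bigl[ 1 + K_1 \frac{c\theta}{I - c\theta}\, e^{-I n} \Bigr].
\end{align*}
```

The first line holds with equality when $S_n \ge cn$ and is trivial otherwise; the last integral converges because $\theta c < I$.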



2.3. Proof of Theorem 2.2. Under the assumptions of Theorem 2.6 the
function $\Lambda(a)$ is convex on $(\bar a, 0)$. Define the parameters,

\[ c_0 = \lim_{a \downarrow \bar a} \frac{d}{da} \Lambda(a); \qquad c_1 = \lim_{a \uparrow 0} \frac{d}{da} \Lambda(a). \]

Theorem 2.6(iv) implies that $c_1 = \phi$. By convexity we have $c_0 \le c_1$, and this
inequality is strict if $F$ is nondegenerate since then

\[ \sigma_a^2 := \frac{d^2}{da^2} \Lambda(a) > 0, \qquad a \in (\bar a, 0). \tag{32} \]

Equation (32) follows from [33], Proposition 2.4 (see also [34], Lemma 4.12).
For $c \in (c_0, c_1)$ the convex dual of $\Lambda$ is expressed

\[ I(c) = \max_{a \in (\bar a, 0)} [ca - \Lambda(a)] = c a^* - \Lambda(a^*), \]

where $a^*$ is chosen so that $\frac{d}{da} \Lambda(a^*) = c$. The function $I$ serves as an LDP rate
function within this range.
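
As a small numerical illustration of the convex dual, the sketch below maximizes $ca - \Lambda(a)$ over a grid. The quadratic $\Lambda$ used here is a toy stand-in (not the log-eigenvalue of the chain), and the function name, grid and parameter values are assumptions made for the example.

```python
import numpy as np

def convex_dual(Lambda, c, a_grid):
    """Approximate I(c) = max_a [c*a - Lambda(a)] over the supplied grid.

    Returns the approximate maximum and the maximizing grid point a*.
    """
    values = c * a_grid - Lambda(a_grid)
    k = int(np.argmax(values))
    return float(values[k]), float(a_grid[k])

# Toy convex function with Lambda(0) = 0 and Lambda'(0) = phi (hypothetical values).
phi, sigma2 = 1.0, 4.0
Lambda = lambda a: phi * a + 0.5 * sigma2 * a**2

a_grid = np.linspace(-1.0, 0.0, 100001)   # stands in for the interval (a_bar, 0)
I_c, a_star = convex_dual(Lambda, 0.0, a_grid)
print(I_c, a_star)
```

For this toy choice the dual is available in closed form, $I(c) = (c - \phi)^2 / (2\sigma^2)$ with $a^* = (c - \phi)/\sigma^2$, which gives a direct way to validate the grid search before applying it to a numerically computed $\Lambda$.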

The LDP (3) for $c \in (c_0, \phi)$ then follows from the multiplicative ergodic
Theorem 2.3 and standard arguments [4, 11].

The proof of the exact LDP (14) is identical to that of the corresponding
results in [33, 34], where

1 i. 

J a* i 



a *o- a * 



$a^* \in (\bar a, 0)$ is again chosen so that $\frac{d}{da} \Lambda(a) = c$, $\check f_{a^*}$ is the eigenfunction satisfying
the normalization (19) and $\sigma^2_{a^*}$ is defined in (32). The proof amounts
to verification of the assumptions of [4], based on the multiplicative ergodic
Theorem 2.3.



3. Application to control-variates. In this section we show how the method
of control-variates can be used to construct a simulator that satisfies an LDP
for the upper and lower tails even when the assumptions on $F$ in Theorem 2.2
are violated.

We begin with a general application of Theorem 2.2.

3.1. Control-variates based on a Lyapunov function. Suppose that (V3)
holds, and consider the function $H := V - PV$. If $\pi(V) < \infty$, then invariance
of $\pi$ implies that $\pi(H) = 0$, and hence the function $H$ can be used to
construct a control-variate as described in the Introduction.

Recall that the assumption $\pi(V) < \infty$ means that the chain is $W$-regular
of degree 2. In the following result we prefer to avoid this restriction and
simply assume directly that $\pi(|H|) < \infty$ and that $\pi(H) = 0$. Under this
assumption, with $H := V - PV$, define the sequence,

\[ \Delta_n := n^{-1} \sum_{k=0}^{n-1} H(\Phi(k)), \qquad n > 0. \tag{33} \]

Positive Harris recurrence of $\Phi$ implies that $\Delta_n \to 0$ as $n \to \infty$ with probability 1
([38], Theorem 17.0.1). The control-variate for simulation of a function
$F$ based on the control-variate $H$ and a given parameter $\theta \in \mathbb{R}$ is given by

\[ L_n(F_\theta) := L_n(F) - \theta L_n(H) = L_n(F) - \theta \Delta_n, \qquad n \ge 1. \]
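
In code, the controlled estimator is a one-line correction of the sample mean. The following minimal sketch assumes the values $F(\Phi(k))$ and $H(\Phi(k))$ along a sample path have already been collected into arrays; the function name and the sample numbers are hypothetical.

```python
import numpy as np

def controlled_estimate(F_path, H_path, theta):
    """Return L_n(F) - theta * Delta_n for one sample path.

    F_path[k] = F(Phi(k)) and H_path[k] = H(Phi(k)) for k = 0, ..., n-1.
    """
    F_path = np.asarray(F_path, dtype=float)
    H_path = np.asarray(H_path, dtype=float)
    L_n_F = F_path.mean()      # standard estimator L_n(F)
    Delta_n = H_path.mean()    # Delta_n as in (33)
    return float(L_n_F - theta * Delta_n)

# Made-up path values, only to show the call.
estimate = controlled_estimate([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], theta=0.5)
print(estimate)
```

Setting `theta=0` recovers the uncontrolled estimator, so the same routine can produce all three estimators compared later in the section.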

In Theorem 3.1 we fix a function $F \in L_\infty^W$ together with constants $\theta_-, \theta_+$
each strictly greater than $\|F\|_W$. Define

\[ F_- = F - \theta_- H, \qquad F_+ = F + \theta_+ H, \]

and the pair of estimators

\[ \phi_n^- = L_n(F_-), \qquad \phi_n^+ = L_n(F_+), \qquad n \ge 1. \tag{34} \]

We denote by $\Lambda_-(a)$, $\Lambda_+(a)$ the logarithm of the generalized principal eigenvalue
for each of the scaled kernels

\[ P_a^- = I_{e^{a F_-}} P, \qquad P_a^+ = I_{e^{a F_+}} P, \qquad a \in \mathbb{R}. \]

The convex duals of the functions $\{\Lambda_-, \Lambda_+\}$ are denoted $\{I_-, I_+\}$.

Theorem 3.1. Suppose that (V3) holds with $W : \mathsf{X} \to [1, \infty)$ near-monotone
and $V : \mathsf{X} \to (0, \infty)$ everywhere finite, and suppose that $\pi(H) = 0$,
where $H := V - PV$. Then, for a given function $F \in L_\infty^W$, there exists $\varepsilon_0 > 0$
such that:

(i) The lower LDP limit holds using $\{\phi_n^+\}$:

\[ \lim_{n \to \infty} n^{-1} \log P\{\phi_n^+ \le c\} = -I_+(c), \qquad c \in (\phi - \varepsilon_0, \phi). \]

(ii) The upper LDP limit holds using $\{\phi_n^-\}$:

\[ \lim_{n \to \infty} n^{-1} \log P\{\phi_n^- \ge c\} = -I_-(c), \qquad c \in (\phi, \phi + \varepsilon_0). \]

(iii) If in addition $F$ is strongly nonlattice, then (i) and (ii) can be
strengthened to the corresponding exact LDP limit analogous to (14).

Parts (i) and (ii) of Theorem 3.1 combined imply that

\[ \lim_{n \to \infty} n^{-1} \log P_x\{ [\phi_n - \phi] \in [-\varepsilon - \theta_+ \Delta_n,\ \varepsilon + \theta_- \Delta_n]^c \} = -\min(I_+(\phi - \varepsilon), I_-(\phi + \varepsilon)) < 0, \qquad 0 < \varepsilon < \varepsilon_0. \]

Proof of Theorem 3.1. By normalization we can assume without loss
of generality that $\pi(F) = 0$.

Define $W' = H + b I_C$ so that by definition of $H$ we have $PV = V - W' + b I_C$,
and by (V3) we also have the lower bound,

\[ W' = (V - PV) + b I_C \ge (W - b I_C) + b I_C = W. \]

Both F and H belong to the function space . 
We can write 

\[ F_- = F - \theta_- H = F - \theta_-(W' - b I_C), \]

which implies that $-F_-$ is near-monotone whenever $\theta_- > \|F\|_W$ since we
have the explicit bound

\[ -F_- \ge (\theta_- - \|F\|_W) W - \theta_- b I_C. \]

Similarly, $F_+$ is near-monotone whenever $\theta_+ > \|F\|_W$ since we can obtain
the similar lower bound

\[ F_+ = F + \theta_+(W' - b I_C) \ge -\|F\|_W W + \theta_+(W' - b I_C) \ge (\theta_+ - \|F\|_W) W - \theta_+ b I_C. \]

Moreover, in either case (13) holds for some $r > 0$, $r_0 < \infty$, since $W \in L_\infty^V$.
Hence the conclusions of Theorem 3.1 follow from Theorem 2.2. □

3.2. Application to simulation of queues. We now return to the reflected
random walk (10) to illustrate the conclusions of Theorem 2.2.

The assumptions of Proposition 3.2 will be imposed throughout this section.
We do not assume that $E[|D(k)|^p] < \infty$ for any $p > 2$, so the CLT may
not hold (see [23], [1], Theorem 3.2, Chapter 5 and [38], Chapter 17). Moreover,
$\Phi$ may not be geometrically ergodic since we do not assume that the
distribution of $D(k)$ has exponential tails.

Proposition 3.2. Consider the reflected random walk (10) satisfying

\[ \delta := -E[D(k)] > 0, \qquad \sigma_D^2 = \mathrm{Var}(D) < \infty, \qquad P\{D(k) > 0\} > 0. \]

Then:

(i) The following identity holds:

\[ PV(x) = V(x) - x + R(x), \qquad x \in \mathbb{R}_+, \tag{35} \]

where $V(x) = 1 + \tfrac12 \delta^{-1}(x^2 + \delta x)$ and $R$ is bounded. Hence (V3) holds with
$W(x) = 1 + \tfrac12 x$, $x \in \mathbb{R}_+$.

(ii) With $\tau_0$ equal to the first return time to the origin, we have for each
$x \in \mathbb{R}_+$,

\[ \lim_{r \to \infty} r^{-1} E_{rx}[\tau_0] = \delta^{-1} x, \qquad \lim_{r \to \infty} r^{-2} E_{rx}\Bigl[ \sum_{k=0}^{\tau_0 - 1} \Phi(k) \Bigr] = \tfrac12 \delta^{-1} x^2. \]

(iii) A unique steady-state distribution $\pi$ exists satisfying $\int e^{\beta x} \pi(dx) = \infty$
for all $\beta > 0$ sufficiently large.

(iv) Let $\lambda_a$ denote the g.p.e. for the kernel $P_{f_a}$ with $f_a = e^{aF}$. Then $\lambda_a = \infty$
for all $a > 0$ when $F(x) = x$.


The proof of Proposition 3.2 is postponed to the end of this section. This
result combined with Theorem 2.2 implies the LDP for the M/M/1 queue:

Proof of Proposition 1.1. Either of the functions $F(x) = x$ or $F(x) = e^{\beta x}$
is near-monotone, with $\phi = \pi(F) < \infty$. Moreover, for $F(x) = x$ condition
(V3) holds by Proposition 3.2(i), and with $F(x) = e^{\beta x}$ for a fixed
$\beta \in (0, |\log(\rho)|)$, the functions $V = kF$, $W = F$ solve (V3) with
$k = [1 - (\alpha e^{\beta} + (1 - \alpha) e^{-\beta})]^{-1}$. Hence the one-sided LDP follows from Theorem 2.2.
The proof of the LDP for positive $a$, with rate function satisfying $I(a) = 0$
for $a > \phi$, follows from Proposition 3.2(iv) combined with Proposition 2.12.
□
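
The claim that $V = kF$, $W = F$ solve (V3) for $F(x) = e^{\beta x}$ can be checked directly. The sketch assumes the M/M/1 model is the reflected walk on the nonnegative integers with increments $+1$ with probability $\alpha$ and $-1$ with probability $1 - \alpha$, and $\rho = \alpha/(1 - \alpha)$; these modelling details, the parameter values and the function name are assumptions for illustration.

```python
import math

def mm1_lyapunov_check(alpha=0.3, beta=0.4, x_max=20):
    """Check PV = V - W away from the origin for V = k*exp(beta*x), W = exp(beta*x)."""
    rho = alpha / (1.0 - alpha)
    assert 0.0 < beta < abs(math.log(rho)), "beta must lie in (0, |log rho|)"
    m = alpha * math.exp(beta) + (1.0 - alpha) * math.exp(-beta)
    k = 1.0 / (1.0 - m)

    def V(y):
        return k * math.exp(beta * y)

    def W(y):
        return math.exp(beta * y)

    def PV(y):
        # One step of the reflected walk: up w.p. alpha, down (reflected at 0) otherwise.
        return alpha * V(y + 1) + (1.0 - alpha) * V(max(y - 1, 0))

    # Relative error in PV = V - W for x >= 1.
    worst = max(abs(PV(x) - (V(x) - W(x))) / V(x) for x in range(1, x_max + 1))
    # Constant b needed at the origin so that PV <= V - W + b * I_{x = 0}.
    b = PV(0) - (V(0) - W(0))
    return k, worst, b

k, worst, b = mm1_lyapunov_check()
print(k, worst, b)
```

Away from the origin the drift identity holds exactly, so the only place a constant $b$ is needed is the single state $x = 0$.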



Although Proposition 1.1 is stated for the M/M/1 queue, analogous conclusions
hold for the general reflected random walk on $\mathbb{R}_+$ under the assumptions
of Proposition 3.2. Part (i) asserts that (V3) holds, and hence
the assumptions of Theorem 3.1 hold with $F(x) = x$, so that the standard
estimator satisfies a one-sided LDP.

We now show how the Lyapunov function $V$ can be used to construct
a control-variate to obtain both upper and lower error bounds. Note that
we do not know if $\pi(V) < \infty$ since we have not assumed that $D$ possesses
a third moment. Consequently, we must use some other means to establish
that $\pi(H) = 0$.

Under (V3) it follows from Proposition 2.1(i) that there exists a solution
$\hat F \in L_\infty^V$ to Poisson's equation (9). Moreover, it is known that the following
scaling property holds:

\[ \lim_{r \to \infty} r^{-2} \hat F(rx) = J(x), \qquad x \in \mathbb{R}_+, \tag{36} \]

where $J$ is the fluid value function, $J(x) = \tfrac12 \delta^{-1} x^2$. The function $\hat F$ is convex,
unique up to an additive constant and can be chosen so that $\hat F : \mathsf{X} \to \mathbb{R}_+$.
The limit result (36) follows from Proposition 5.3 of [6] (see also [35], Theorem 16).
Convexity is established in [5, 37] for network models.
Iterating Poisson's equation gives

\[ P^n \hat F = \hat F + n\phi - \sum_{k=0}^{n-1} P^k F, \qquad n \ge 1. \]

It follows from the $f$-norm ergodic theorem [38] that for each initial condition
$x \in \mathbb{R}_+$,

\[ n^{-1} P^n \hat F(x) = n^{-1} \hat F(x) + \phi - n^{-1} \sum_{k=0}^{n-1} P^k F(x) \to 0, \qquad n \to \infty. \]

The quadratic growth and positivity of $\hat F$ imply that we can find $\varepsilon > 0$
such that $\hat F(x) \ge \varepsilon V(x) - 1$ for all $x$. Since $n^{-1} P^n \hat F(x) \to 0$ as $n \to \infty$, we
conclude that also $n^{-1} P^n V(x) \to 0$ as $n \to \infty$ for each $x$.

On setting $H = V - PV$ we see from (35) that the function $H$ can be
written

\[ H(x) = x - R(x), \qquad x \in \mathbb{R}_+, \tag{37} \]

where $R : \mathsf{X} \to \mathbb{R}$ is bounded. In particular, it has linear growth so that
$\pi(|H|) < \infty$. Moreover, we have

\[ P^n V = V - \sum_{k=0}^{n-1} P^k H, \qquad n \ge 1, \]

and since $n^{-1} P^n V \to 0$ pointwise, we conclude that $\pi(H) = 0$. This justifies
consideration of $F_\theta = F - \theta H$ in an asymptotically unbiased estimator of $\phi$.

It also follows from the representation (37) that for any given $r \ge 1$, the set
$C_{F_\theta}(r)$ is compact whenever $\theta < 1$, and $C_{-F_\theta}(r)$ is compact whenever $\theta > 1$.
This structure allows the application of Theorem 2.2 to obtain confidence
bounds:

Proposition 3.3. Consider the reflected random walk (10) with $E[D(k)^2] < \infty$;
$P\{D(k) > 0\} > 0$; and $\delta > 0$. Let $F(x) = x$ for $x \in \mathbb{R}_+$, fix two parameters
$\theta_+ < 1$ and $\theta_- > 1$, and define the pair of estimators $\{\phi_n^-, \phi_n^+\}$ via (34) with

\[ F_- = F - \theta_- H, \qquad F_+ = F - \theta_+ H. \]

Then, there exists a pair of convex functions $\{I_-, I_+\}$ on $\mathbb{R}$, and a constant
$\varepsilon_0 > 0$ such that for each initial condition $x \in \mathsf{X}$,

\[ \lim_{n \to \infty} n^{-1} \log P_x\{\phi_n^+ \le c\} = -I_+(c) < 0, \qquad c \in (\phi - \varepsilon_0, \phi), \]

\[ \lim_{n \to \infty} n^{-1} \log P_x\{\phi_n^- \ge c\} = -I_-(c) < 0, \qquad c \in (\phi, \phi + \varepsilon_0). \]

Consequently, with $\{\Delta_n\}$ defined in (33), the following limit holds for each
$\varepsilon \in (0, \varepsilon_0)$:

\[ \lim_{n \to \infty} n^{-1} \log P_x\{ \phi_n - \phi \in [-\varepsilon + \theta_+ \Delta_n,\ \varepsilon + \theta_- \Delta_n]^c \} = -\min(I_+(\phi - \varepsilon), I_-(\phi + \varepsilon)) < 0. \]

Fig. 2. Monte Carlo estimates of $\phi := \pi(F)$ with $F(x) = x$ for $x \in \mathbb{R}_+$. The stochastic
process $\Phi$ is reflected random walk (10) with $\delta = -E[D(k)] = 1$, and $\sigma_D^2 = 25$. The uncontrolled
estimator exhibits large fluctuations around its steady-state mean. The upper and
lower controlled estimators show less variability, and the bound $\phi_n^- < \phi_n^+$ is maintained
throughout the run.

Proof. The assumption that $P\{D(k) > 0\} > 0$ is used to deduce that
$\pi$ has support outside of the origin. For $\varepsilon > 0$ sufficiently small the set
$C = [0, \varepsilon]$ is small, and satisfies the one-step minorization condition: for some
$\delta > 0$, $P(x, \cdot) \ge \delta \nu(\cdot)$ with $\nu$ the point-mass at the origin. It follows that the
functions $\{F_-, F_+\}$ are each nondegenerate.

The conclusions then follow from Theorem 3.1. □

An illustration of these controlled estimators is provided in Figure 2. The
sequence $D$ was chosen of the form $D(k) = A(k) - S(k)$, where $A$ and $S$
are mutually independent, i.i.d. sequences. Given nonnegative parameters
$\mu, \alpha, \kappa$ we set

\[ P\{S(k) = (1 + \kappa)\mu\} = 1 - P\{S(k) = 0\} = (1 + \kappa)^{-1}, \]

\[ P\{A(k) = (1 + \kappa)\alpha\} = 1 - P\{A(k) = 0\} = (1 + \kappa)^{-1}. \]

Consequently, we have $E[D(k)] = E[A(k)] - E[S(k)] = -(\mu - \alpha)$, and
$\sigma_D^2 = \sigma_A^2 + \sigma_S^2 = (\mu^2 + \alpha^2)\kappa$. The simulation results shown in Figure 2 used $\mu = 4$,
$\alpha = 3$ and $\kappa = 1$, so that $\delta = 1$ and $\sigma_D^2 = 25$.

Fig. 3. The plot at left shows the same simulation as shown in Figure 2, with the time
horizon increased to $T = 20{,}000$. The plot at right shows the two controlled estimators
along with the uncontrolled estimator when the variance is increased to $\sigma_D^2 = 125$. In each
case the estimates obtained from the standard Monte Carlo estimator are significantly
larger than those obtained using the controlled estimator, and the bound $\phi_n^- < \phi_n^+$ again
holds for all large $n$.
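
A simulation along these lines can be sketched as follows. This is a minimal illustration rather than the code used for the figures: it assumes the recursion $\Phi(k+1) = [\Phi(k) + D(k+1)]_+$ started at the origin, computes $H = V - PV$ exactly from the four-point law of $D$, and uses the form $F - \theta H$ for both controlled estimators. The function name, seed, run length and default parameters are assumptions.

```python
import numpy as np

def simulate_estimators(n, mu=4.0, alpha=3.0, kappa=1.0,
                        theta_minus=1.05, theta_plus=1.0, seed=0):
    """Return (phi_n, phi_n_minus, phi_n_plus, Delta_n) for F(x) = x."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + kappa)
    delta = mu - alpha                     # delta = -E[D(k)]

    # Exact law of D = A - S, needed to evaluate PV in closed form.
    probs = np.array([1.0 - p, p])
    a_vals = np.array([0.0, (1.0 + kappa) * alpha])
    s_vals = np.array([0.0, (1.0 + kappa) * mu])
    d_vals = (a_vals[:, None] - s_vals[None, :]).ravel()
    d_prob = (probs[:, None] * probs[None, :]).ravel()

    def V(x):
        return 1.0 + 0.5 * (x**2 + delta * x) / delta

    def H(x):
        # H = V - PV, with PV(x) = E[V([x + D]_+)].
        return V(x) - float(d_prob @ V(np.maximum(x + d_vals, 0.0)))

    D = rng.choice(d_vals, size=n, p=d_prob)
    x, f_sum, h_sum = 0.0, 0.0, 0.0
    for k in range(n):
        f_sum += x                         # F(Phi(k)) = Phi(k)
        h_sum += H(x)
        x = max(x + D[k], 0.0)             # reflected random walk step

    phi_n = f_sum / n                      # uncontrolled estimator L_n(F)
    delta_n = h_sum / n                    # Delta_n from (33)
    return (phi_n,
            phi_n - theta_minus * delta_n,
            phi_n - theta_plus * delta_n,
            delta_n)

phi_n, phi_minus, phi_plus, delta_n = simulate_estimators(2000)
print(phi_n, phi_minus, phi_plus)
```

With `theta_plus=1.0` the upper controlled estimator is the sample mean of the bounded function $R$, which is why its sample paths are much less variable than those of the uncontrolled estimator.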

The control-variate parameter values $\theta_- = 1.05$ and $\theta_+ = 1$ were used
in the construction of $\{\phi_n^-, \phi_n^+\}$. While this value of $\theta_+$ violates the strict
inequality $\theta_+ < 1$ required in Proposition 3.3, we have in this case

\[ F_+(x) = x - \theta_+(x - R(x)) = R(x), \qquad x \in \mathbb{R}_+. \]

The function $R$ has mean zero and satisfies (13) when $D$ has bounded support
[or just a $(2 + \varepsilon)$-moment], so Theorem 2.2 implies that the lower LDP
does hold using $\{\phi_n^+\}$ when $\theta_+ = 1$.

The plot at left in Figure 3 illustrates the simulation shown previously in
Figure 2, with the time horizon increased to $T = 20{,}000$. The plot at right
shows the controlled and uncontrolled estimators with $\kappa = 5$, and hence
$\sigma_D^2 = 125$. The bounds $\phi_n^- < \phi_n^+ < \phi_n$ hold for all large $n$ even though all
three estimators are asymptotically unbiased.

PROOF of Proposition 3.2. The function $R$ has the explicit form,

\[ R(x) = \tfrac12 \delta^{-1} \sigma_D^2 - \tfrac12 \delta^{-1} E[\{(x + D(k))^2 + \delta(x + D(k))\} \mathbf{1}(x < -D(k))], \qquad x \in \mathsf{X}. \]

Under the second-moment assumption on $D(k)$ the function $R$ is bounded,
since by Chebyshev's inequality,

\[ E[x^2 \mathbf{1}(x < -D(k))] \le x^2 P\{|D(k)| \ge x\} \le E[D(k)^2], \qquad x \in \mathbb{R}_+. \]

The second limit in (ii) follows from Proposition 5.3 of [6] [it can also be
proved using (35) combined with the comparison theorem]. The proof of the
first limit is similar and is omitted (see [7] for similar results).

The existence of $\pi$ satisfying $\pi(F) < \infty$ with $F(x) = x$ follows from (35)
and Proposition 2.1. On the interior of the set of $\beta \in \mathbb{R}$ satisfying $\pi(e^{\beta F}) < \infty$
we have by stationarity,

\[ \pi(e^{\beta F}) = E_\pi[\exp(\beta[\Phi(k) + D(k+1)]_+)] \ge E_\pi[\exp(\beta(\Phi(k) + D(k+1)))]. \]

Hence by independence of $\Phi(k)$ and $D(k+1)$, the log moment generating
functions for $\Phi$ and $D$ satisfy the bound

\[ M(\beta) := \log(\pi(e^{\beta F})) \ge M(\beta) + M_D(\beta). \]

It follows that $M(\beta) = \infty$ when $M_D(\beta) > 0$, and this holds for large enough
$\beta > 0$ under the assumption that $P\{D(k) > 0\} > 0$.

We now prove (iv). Suppose that in fact $\lambda_{\theta_0} < \infty$ for some positive $\theta_0$. It
then follows that for $\Lambda_0 > \Lambda(\theta_0) := \log(\lambda_{\theta_0})$,

\[ \sum_{n=0}^{\infty} E_0\Bigl[ \exp\Bigl( \sum_{k=0}^{n-1} [\theta_0 \Phi(k) - \Lambda_0] \Bigr) \mathbf{1}\{\Phi(n) = 0\} \Bigr] < \infty. \]

Define, with $\tau_0$ equal to the first return time to the origin,

\[ h(x) := E_x\Bigl[ \exp\Bigl( \sum_{k=0}^{\tau_0 - 1} [\theta_0 \Phi(k) - \Lambda_0] \Bigr) \Bigr]. \]

Then, from the previous bound,

\[ h(0) = \sum_{n=0}^{\infty} E_0\Bigl[ \exp\Bigl( \sum_{k=0}^{n-1} [\theta_0 \Phi(k) - \Lambda_0] \Bigr) \mathbf{1}\{\tau_0 = n\} \Bigr] < \infty. \]

We next demonstrate that $h$ must be $\pi$-integrable.
For any $x \in \mathbb{R}_+$ we have

\[ h(x) = \exp(\theta_0 F(x) - \Lambda_0) E_x[h(\Phi_1) I_{\Phi_1 \ne 0} + I_{\Phi_1 = 0}], \]

from which it follows that the following identity holds for a bounded function
$b_0 : \mathbb{R}_+ \to \mathbb{R}$:

\[ \exp(\theta_0 F(x) - \Lambda_0) Ph = \exp(b_0(x)) h(x), \qquad x \in \mathbb{R}_+. \]

This is a version of the drift condition (DV3) of [33, 34], which is far stronger
than the drift condition (V3) of [38]. The comparison theorem of [38] implies
that $\pi(h) < \infty$.

Next we obtain a lower bound on $h$ using Jensen's inequality:

\[ \log h(x) \ge E_x\Bigl[ \sum_{k=0}^{\tau_0 - 1} [\theta_0 \Phi(k) - \Lambda_0] \Bigr]. \]

Applying Proposition 3.2(ii), we conclude that the right-hand side is bounded
from below by a quadratic function of $x$, giving a bound of the form, for
some constant $b < \infty$,

\[ \log h(x) \ge \tfrac12 \delta^{-1} x^2 - b(x + 1), \qquad x \in \mathbb{R}_+. \]

This bound combined with Proposition 3.2(iii) implies that $\pi(h) = \infty$, which
is a contradiction. □

4. Conclusions. We have seen that it is possible to establish strong LDP 
asymptotics for unbounded functions even when the assumptions of Donsker 
and Varadhan [46, 47] or the weaker geometric ergodicity assumption are 
violated. We are currently developing worst-case bounds when the statistics 
of the process are only partially known [32, 44], and we are also searching 
for ways of identifying explicit bounds on the rate function. 

We are eager to develop these simulation techniques to better understand 
the value of the application of multiple control-variates for improved confi- 
dence bounds. Applications to network models are also considered in current 
research.